AITopics | language model implicitly learn

Collaborating Authors

language model implicitly learn

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Large language models implicitly learn to straighten neural sentence trajectories to construct a predictive representation of natural language.

Neural Information Processing SystemsDec-26-2025, 07:21:02 GMT

Predicting upcoming events is critical to our ability to effectively interact with ourenvironment and conspecifics. In natural language processing, transformer models,which are trained on next-word prediction, appear to construct a general-purposerepresentation of language that can support diverse downstream tasks. However, westill lack an understanding of how a predictive objective shapes such representations.Inspired by recent work in vision neuroscience Hénaff et al. (2019), here we test ahypothesis about predictive representations of autoregressive transformer models.In particular, we test whether the neural trajectory of a sequence of words in asentence becomes progressively more straight as it passes through the layers of thenetwork. The key insight behind this hypothesis is that straighter trajectories shouldfacilitate prediction via linear extrapolation. We quantify straightness using a 1-dimensional curvature metric, and present four findings in support of the trajectorystraightening hypothesis: i) In trained models, the curvature progressively decreasesfrom the first to the middle layers of the network.

predictive representation, straighten neural sentence trajectory, trajectory, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

Large language models implicitly learn to straighten neural sentence trajectories to construct a predictive representation of natural language.

Neural Information Processing SystemsJan-19-2025, 13:37:56 GMT

predictive representation, straighten neural sentence trajectory, trajectory, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.40)

Add feedback

Large language models implicitly learn to straighten neural sentence trajectories to construct a predictive representation of natural language

Hosseini, Eghbal A., Fedorenko, Evelina

arXiv.org Artificial IntelligenceNov-5-2023

Predicting upcoming events is critical to our ability to interact with our environment. Transformer models, trained on next-word prediction, appear to construct representations of linguistic input that can support diverse downstream tasks. But how does a predictive objective shape such representations? Inspired by recent work in vision (Henaff et al., 2019), we test a hypothesis about predictive representations of autoregressive transformers. In particular, we test whether the neural trajectory of a sentence becomes progressively straighter as it passes through the network layers. The key insight is that straighter trajectories should facilitate prediction via linear extrapolation. We quantify straightness using a 1-dimensional curvature metric, and present four findings in support of the trajectory straightening hypothesis: i) In trained models, the curvature decreases from the early to the deeper layers of the network. ii) Models that perform better on the next-word prediction objective exhibit greater decreases in curvature, suggesting that this improved ability to straighten sentence trajectories may be the driver of better language modeling performance. iii) Given the same linguistic context, the sequences that are generated by the model have lower curvature than the actual continuations observed in a language corpus, suggesting that the model favors straighter trajectories for making predictions. iv) A consistent relationship holds between the average curvature and the average surprisal of sentences in the deep model layers, such that sentences with straighter trajectories also have lower surprisal. Importantly, untrained models do not exhibit these behaviors. In tandem, these results support the trajectory straightening hypothesis and provide a possible mechanism for how the geometry of the internal representations of autoregressive models supports next word prediction.

language model implicitly learn, predictive representation, straighten neural sentence trajectory, (1 more...)

arXiv.org Artificial Intelligence

2311.0493

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.40)

Add feedback

Models In a Spelling Bee: Language Models Implicitly Learn the Character Composition of Tokens

Itzhak, Itay, Levy, Omer

arXiv.org Artificial IntelligenceAug-25-2021

Standard pretrained language models operate on sequences of subword tokens without direct access to the characters that compose each token's string representation. We probe the embedding layer of pretrained language models and show that models learn the internal character composition of whole word and subword tokens to a surprising extent, without ever seeing the characters coupled with the tokens. Our results show that the embedding layer of RoBERTa holds enough information to accurately spell up to a third of the vocabulary and reach high average character ngram overlap on all token types. We further test whether enriching subword models with additional character information can improve language modeling, and observe that this method has a near-identical learning curve as training without spelling-based enrichment. Overall, our results suggest that language modeling objectives incentivize the model to implicitly learn some notion of spelling, and that explicitly teaching the model how to spell does not enhance its performance on such tasks.

information, probe, spellingbee, (13 more...)

arXiv.org Artificial Intelligence

2108.11193

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(6 more...)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.70)

Add feedback